A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces

Authors

  • George Saon
  • Peder Olsen
Abstract

Finding the nearest neighbor among a large collection of high-dimensional vectors can be a computationally demanding task. In this paper, we pursue fast vector matching by representing vectors in $\mathbb{R}^n$ with lower-dimensional projections in $\mathbb{R}^m$, $m \le n$. The key to creating and using the representative vectors is a lower bound on the Euclidean distance between arbitrary vectors in $\mathbb{R}^n$ based on the submultiplicative property of induced matrix norms. For any non-zero projection matrix $A \in \mathbb{R}^{m \times n}$, the bound is proportional to the distance between the projected vectors. We study other existing bounds involving orthogonal transforms and piecewise constant approximation maps in light of this formulation. Additionally, we address the question of how to optimize the projection matrix given a dataset in order to make the bound as tight as possible. Experimental results on a speech database show that exact nearest neighbor computation can be accelerated by a factor of 5 using the proposed bound.
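
The abstract does not state the bound explicitly. One reading that is consistent with it, assuming the induced norm is the spectral norm, is $\|x - y\|_2 \ge \|Ax - Ay\|_2 / \|A\|_2$, which follows from $\|A(x - y)\|_2 \le \|A\|_2 \, \|x - y\|_2$. The sketch below illustrates how such a bound could be used to prune candidates during an exact search. The function names, the random projection matrix, and the sort-then-prune loop are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def projected_lower_bound(A, x, y):
    """Lower bound on ||x - y||_2, assuming the spectral (induced 2-) norm of A."""
    spectral_norm = np.linalg.norm(A, 2)  # largest singular value of A
    return np.linalg.norm(A @ x - A @ y) / spectral_norm


def nearest_neighbor(query, data, A):
    """Exact nearest neighbor of `query` among the rows of `data`.

    Hypothetical filter-and-refine loop: cheap bounds are computed in the
    projected space, and the full distance is evaluated only while a
    candidate's bound is still below the best distance found so far.
    """
    spectral_norm = np.linalg.norm(A, 2)
    projected = data @ A.T            # shape (N, m); could be precomputed offline
    q_proj = A @ query
    bounds = np.linalg.norm(projected - q_proj, axis=1) / spectral_norm

    best_idx, best_dist, full_evals = -1, np.inf, 0
    for i in np.argsort(bounds):      # smallest bound first
        if bounds[i] >= best_dist:
            break                     # every remaining bound is at least this large
        full_evals += 1
        dist = np.linalg.norm(data[i] - query)
        if dist < best_dist:
            best_idx, best_dist = int(i), dist
    return best_idx, best_dist, full_evals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 32))
    query = rng.normal(size=32)
    A = rng.normal(size=(8, 32))      # arbitrary non-zero matrix, not optimized
    idx, dist, evals = nearest_neighbor(query, data, A)
    print(f"nearest index {idx}, full distance evaluations: {evals} of {len(data)}")
```

How many full distance computations are skipped depends on how tight the bound is for the chosen matrix, which is why the abstract treats the optimization of $A$ on a dataset as a separate question.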

Similar resources

Classification, with Applications to Object and Shape Recognition in Image Databases

Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time,...

Significance-Sensitive Nearest-Neighbor Search for Efficient Similarity Retrieval of Multimedia Information

Nearest-neighbor search (NN-search) in the feature space is widely used for the similarity retrieval of multimedia information. Each piece of multimedia information is mapped to a vector in a multi-dimensional space where the distance between two vectors (typically, Euclidean distance between the heads of vectors) corresponds to the similarity of multimedia information. Once the feature space i...

Metric-Based Shape Retrieval in Large Databases

This paper examines the problem of database organization and retrieval based on computing metric pairwise distances. A low-dimensional Euclidean approximation of a high-dimensional metric space is not efficient, while search in a high-dimensional Euclidean space suffers from the “curse of dimensionality”. Thus, techniques designed for searching metric spaces must be used. We evaluate several su...

PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces

In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures – the so-called “curse of dimensionality”. This also affects approximately correct (AC) algorithms, which return as result a point whose distance from q is less than (1 + ε) times the distance between ...

Nearest Neighbor Searching in Image Databases

A frequently encountered type of query in image database systems is to find the k most similar images to a query image with respect to its feature. Processing such queries requires substantially different search algorithms than those for the normal k nearest neighbor problem: dimensionality of the feature may be very high and similarity measure may not be as simple as a Euclidean dist...

Publication date: 2009